Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach
نویسندگان
چکیده
In recent years, the advent of great technological advances has produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an extending range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in genetic neuroimaging. To handle high-dimensional problems, dimension reduction might be implemented as pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into PLSC to efficiently solve high-dimensional multimodal problems like genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole genome SNP measures as genotypes and whole brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is data type independent. Therefore, PLSC-RP opens up a wide range of possible applications. It can be used for any integrative analysis that combines information from multiple sources.
منابع مشابه
Combining brain imaging and genetic data using fast and efficient multivariate correlation analysis
Combining brain imaging and genetic data using fast and efficient multivariate correlation analysis Claudia Grellmann (Dissertation, 2017) Many human neurological and psychiatric disorders are substantially heritable and there is growing inter-est in searching for genetic variants explaining variability in disease-induced alterations of brain anatomy and function, as measured using neuroimaging...
متن کاملEstimating Returns to Scale in the Presence of Undesirable Factors in Data Envelopment Analysis
This research identifies returns to scale (RTS) of efficient decision making units (DMUs) with desirable (good) and undesirable (bad) inputs and outputs by presenting a new DEA (data envelopment analysis) approach. In this study, we first introduce a new input-output oriented model to determine efficient DMUs in the presence of undesirable factors and then, returns to scale of these DMUs are es...
متن کاملOnline Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
متن کاملCalculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms
The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...
متن کاملSupervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields especially in analyzing of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
متن کامل